H-tuple approach to evaluate statistical significance of biological sequence comparison with gaps.
Abstract
We propose an approximate distribution for the gapped local score of a two-sequence comparison. Our method combines an adapted scoring scheme that accounts for gaps with an approximate distribution of the ungapped local score of two independent sequences of i.i.d. random variables. The new scoring scheme is defined on h-tuples of the sequences, using the gapped global score. The influence of h and the accuracy of the p-value are studied numerically and compared with the p-values obtained from BLAST. The numerical experiments indicate that our approximate p-values outperform the BLAST ones, particularly for short sequences, both simulated and real.
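The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the h-tuples are formed or which substitution scores and gap penalties are used, so the non-overlapping blocks, the match/mismatch/gap values, and all function names below are assumptions chosen for illustration. Each pair of h-tuples is scored with a gapped global (Needleman-Wunsch) score, and an ungapped local score is then computed over the two sequences of h-tuples.

```python
from typing import Callable, List, Sequence


def global_score(a: str, b: str, match: int = 1,
                 mismatch: int = -1, gap: int = -2) -> int:
    """Gapped global (Needleman-Wunsch) score with a linear gap penalty.

    The match/mismatch/gap values are illustrative assumptions.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                          prev[j] + gap,      # gap in b
                          curr[j - 1] + gap)  # gap in a
        prev = curr
    return prev[-1]


def h_tuples(seq: str, h: int) -> List[str]:
    """Split seq into non-overlapping blocks of length h.

    Assumption: a trailing remainder shorter than h is dropped.
    """
    return [seq[i:i + h] for i in range(0, len(seq) - h + 1, h)]


def ungapped_local_score(x: Sequence, y: Sequence,
                         score: Callable[[object, object], int]) -> int:
    """Best score of any gap-free (diagonal) local segment pair."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            # Extend the diagonal segment or restart at zero.
            curr[j] = max(0, prev[j - 1] + score(x[i - 1], y[j - 1]))
            best = max(best, curr[j])
        prev = curr
    return best


def h_tuple_local_score(s: str, t: str, h: int) -> int:
    """Ungapped local score over h-tuples, each pair scored globally with gaps."""
    return ungapped_local_score(h_tuples(s, h), h_tuples(t, h), global_score)
```

With h = 1 each tuple is a single letter, so under these assumptions the sketch reduces to the ordinary ungapped local score; larger h lets gaps occur inside a tuple pair while the tuple-level comparison stays gap-free, which is what allows an ungapped-score distribution to be applied.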
Journal title:
- Statistical applications in genetics and molecular biology
Volume 6, Issue
Pages -
Publication date 2007